Feature selection
What is Feature Selection
= selecting a subset of all features for model training, usually done during data preprocessing.
- Advantages
- avoid the curse of dimensionality
- improve the predictive performance and interpretability of models
- shorten training time
- prevent overfitting
- reduce generalization error
- Domain knowledge is important!!!
Feature Selection methods
These are some of the techniques commonly used for feature selection in data analysis:
| Method | Pros | Cons |
|---|---|---|
| Intrinsic methods | fast; no external selection tool is needed; provide a direct connection between feature selection and the objective function | model-dependent; the choice of models is limited |
| Filter methods | simple and fast; can capture the large trends in the data | tend to select redundant features; ignore relationships among features |
| Wrapper methods | search over a wider variety of feature subsets | tend to overfit; slow |
Intrinsic methods
- also called embedded or implicit methods: the model performs feature selection as part of training, so no extra selection step is required
- examples
- tree-based models: if a feature is never used in any split, the model effectively treats it as uninformative about the target variable
- regularization models (Lasso regression Cost Functions#^ed4ab0, Ridge Regression Cost Functions#^aef45a): only keep features with non-zero coefficients; Lasso can shrink coefficients exactly to zero, while Ridge only shrinks them toward zero (see the sketch below)
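As a rough illustration, the sketch below uses scikit-learn's `SelectFromModel` with a Lasso estimator; the diabetes dataset and `alpha=0.1` are placeholder choices, not a recommendation.

```python
# Sketch: intrinsic (embedded) selection via Lasso (L1) regularization.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)  # scale so the penalty treats features comparably

# Lasso shrinks uninformative coefficients to exactly zero;
# SelectFromModel keeps only the features with non-zero coefficients.
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X_scaled, y)
print("kept features:", list(X.columns[selector.get_support()]))
```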
Filter methods
- select features that correlate well with target variables
- examples
- univariate statistical analysis: select the features with the strongest correlation with the target variable (see the sketch below)
- feature importance-based: select features with higher Feature Importance scores
- via coefficients (linear regression, logistic regression) - but this may miss non-linear relationships
- via impurity-based feature importances (tree-based models)
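A minimal univariate-filter sketch with scikit-learn's `SelectKBest`; `f_regression` and `k=5` are arbitrary illustrative choices.

```python
# Sketch: univariate filter selection that keeps the k features with the
# strongest (linear) association with the target.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

X, y = load_diabetes(return_X_y=True, as_frame=True)
selector = SelectKBest(score_func=f_regression, k=5).fit(X, y)
print("kept features:", list(X.columns[selector.get_support()]))
```

Swapping `f_regression` for `mutual_info_regression` also captures non-linear dependence, which addresses the caveat about coefficient-based scores above.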
Wrapper methods
- iterative process that repeatedly fits the model on different subsets of features (see the sketches after this list)
- examples
- sequential feature selection (SFS)
- Forward SFS: start with 0 features -> sequentially add features
    1. select a significance level (SL, e.g. SL = 0.05)
    2. fit all simple regression models and select the one whose predictor has the lowest P-value
    3. keep this variable and fit all possible models with one extra predictor added to the one(s) you already have
    4. consider the new predictor with the lowest P-value: if P < SL, go back to step 3; otherwise stop and keep the previous model
- Backward SFS (Recursive Feature Elimination; fastest): start with all features -> sequentially remove features
    1. select a significance level (SL, e.g. SL = 0.05)
    2. fit the full model with all possible predictors
    3. consider the predictor with the highest P-value: if P > SL, go to step 4; otherwise stop
    4. remove that predictor
    5. fit the model without this variable
    6. return to step 3
- Bidirectional Elimination (stepwise regression) = combination of Backward Elimination and Forward Selection
    1. select significance levels to enter and to stay in the model (e.g. SL_enter = SL_stay = 0.05)
    2. perform the next step of forward selection (new variables must have P < SL_enter to enter)
    3. perform all steps of backward elimination (old variables must have P < SL_stay to stay)
    4. repeat steps 2 and 3 until no new variables can enter and no old variables can exit
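A minimal sketch of the P-value-based forward selection described above, assuming a pandas `DataFrame` `X`, a target `Series` `y`, and statsmodels OLS; the helper name `forward_selection` is purely illustrative.

```python
# Sketch: forward selection driven by P-values (SL = 0.05).
import statsmodels.api as sm

def forward_selection(X, y, sl=0.05):
    selected = []                      # features kept so far
    remaining = list(X.columns)        # candidate features
    while remaining:
        # Fit one model per candidate and record that candidate's P-value.
        pvals = {}
        for col in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            pvals[col] = model.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] < sl:           # best candidate is significant -> keep it
            selected.append(best)
            remaining.remove(best)
        else:                          # no significant candidate -> stop
            break
    return selected
```

scikit-learn ships ready-made wrapper selectors that follow the same forward/backward idea but score candidates by cross-validation (`SequentialFeatureSelector`) or by model coefficients/feature importances (`RFE`) rather than P-values; the settings below are placeholders.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Forward SFS scored by 5-fold cross-validation.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward", cv=5
).fit(X, y)
print("forward SFS:", list(X.columns[sfs.get_support()]))

# Backward elimination via RFE, dropping the weakest coefficient each round.
rfe = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)
print("RFE:", list(X.columns[rfe.support_]))
```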